Add function metadata ability to push down struct argument in optimizer #25175
Conversation
This pull request was exported from Phabricator. Differential Revision: D74738214
Looks good! Can you add a test? Maybe add one to TestHiveLogicalPlanner verifying that the subfields are being pushed down, and then a query correctness test (see TestLambdaSubfieldPruning for examples of tests for a related feature). You may have to register a function in the test that uses the new field you added (e.g., a passthrough function that takes a row and returns it unchanged) to exercise your code.
@@ -33,4 +33,6 @@
    boolean deterministic() default true;

    boolean calledOnNullInput() default false;

    int pushdownSubfieldArgIndex() default -1;
This interface method, and the ones on ScalarFunction and SqlInvokedScalarFunction, are parameters to an annotation. Does a user implementing a function using the Presto SPI manually specify this value in the annotation? If not, then these should be removed.
Yes, they are specified in the annotation. If it is not specified, then it defaults to -1.
The annotation looks like this:
@CodegenScalarFunction(value = "function_name", calledOnNullInput = true, pushdownSubfieldArgIndex = 0)
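For context, the validation code later in this thread treats the index as an Optional (pushdownSubfieldArgIndex.isPresent() / .get()). A minimal sketch, assuming the -1 default is translated into an empty Optional while the annotation is parsed into function metadata; the method name here is illustrative, not the PR's actual code:

import java.util.Optional;

// Sketch only: a negative index (the annotation default) means no pushdown argument was specified.
static Optional<Integer> parsePushdownSubfieldArgIndex(int annotationValue)
{
    return annotationValue < 0 ? Optional.empty() : Optional.of(annotationValue);
}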
Can you add some documentation to our docs on how this needs to be used?
Also, why not add an annotation to the argument itself? Something like @RowMayBeDereferenced. (BTW, can you give an example of how a function could know that this is safe to do?) You could annotate multiple of them, and we could validate that the argument is, indeed, a struct to begin with.
Also, your example references internal queries; can you add pastes of the explain plans?
Can you also add some tests?
Instead of using an annotation on the argument, I have chosen to pass the arg index in the codegen decorator because this allows it to live inside the FunctionMetadata.
I added some tests to TestHiveLogicalPlanner. One more change I will make is to perform some validation that the argIndex specified does correspond to a row type, and to throw a warning when the code path is not reached due to an invalid index.
@tdcmeehan @rschlussel
Looking into the ComplexTypeDescriptor approach. I think it requires some changes to FunctionExtractor: the reshape function is registered through a plugin, and currently the Plugin interface only supports annotated function definitions.
I tried adding the snippet below in FunctionExtractor, but this approach does not seem ideal. It also assumes that the SqlScalarFunction instance has only one constructor and that it is a no-arg constructor. The way other SqlScalarFunctions get registered is inside BuiltInTypeAndFunctionNamespaceManager, and they are all BuiltInFunctions because SqlScalarFunction is an extension of BuiltInFunction.
// Assumes clazz has a public no-arg constructor and is a BuiltInFunction.
if (SqlFunction.class.isAssignableFrom(clazz)) {
    try {
        return new FunctionListBuilder()
                .function((BuiltInFunction) clazz.getConstructor().newInstance())
                .getFunctions();
    }
    catch (Exception e) {
        throw new RuntimeException(format("Error adding BuiltInFunction %s", e.getMessage()));
    }
}
I also don't see any examples of SqlScalarFunction being added through a plugin. If you look at any Plugin implementation, the getFunctions override only adds functions that are annotated, most of them with @ScalarFunction. Take, for example, I18nFunctionsPlugin.
I'm wondering if you two have any other thoughts on this approach and other things we need to consider.
I agree that ComplexTypeFunctionDescriptor would be the best place for it. However, SqlScalarFunction is in presto-main-base, so you can't use it from a plugin. You need a way to generate it from an annotation.
What if we added the ability to specify the ComplexTypeFunctionDescriptor from an annotation, like what was removed in commit d6e008c (and added earlier in the same PR)? I can't remember why it was removed.
I see the problem now. Plugin-registered functions must use annotations, and this function is registered through the Plugin SPI, which means we can't incorporate the feedback above directly (for example, from within a Plugin you won't even be able to subclass SqlScalarFunction).
So taking a step back, the main thing I'm not in favor of in this PR is adding pushdownSubfieldArgIndex to our scalar function annotations (going back to the scope of the API addition vs. what we get out of it). We also already use ComplexTypeFunctionDescriptor to instruct PushdownSubfields on how to interact with complex types, but these all apply to maps and arrays, and they're functions that are built in and can use the verbose SqlScalarFunction definition method above because they're defined in the main module.
I can think of a few ways of going about it.
- Trino has a FunctionProvider SPI that lets you lazily build scalar functions; that could work to allow us to verbosely construct this internal function from within the SPI. (Similarly, we could allow SqlScalarFunction to annotate a no-arg static factory method which returns a concrete instance of SqlFunction.)
- We could add an annotation that allows one to define companion information in the function, such as ComplexTypeFunctionDescriptor. Or we could add a separate annotation that allows the fields in ComplexTypeFunctionDescriptor to be specified independently, though this would be awkward at the method level, and I think it would be more appropriate at the argument level.
I'm also wondering, looking at ComplexTypeFunctionDescriptor, if the functionality we're adding here can already be accomplished by the code in this class, specifically the outputToInputTransformationFunction field.
I would be in favor of restoring an annotation, but I think the one above was mostly a 1:1 copy of ComplexTypeFunctionDescriptor and can be improved (I don't think it should have a StaticMethodPointer that references internal engine classes, and argumentIndicesContainingMapOrArray should just be reworked to be per-argument).
Okay, so I think the next steps here are:
1. Add the new passthrough field to ComplexTypeFunctionDescriptor instead of the scalar annotation; it can then be used in SubfieldPushdown.
2. Create an annotation for creating a ComplexTypeFunctionDescriptor (it probably doesn't need to be fully featured at this point and can be added to as needed), and add parsing for the annotation when generating the function metadata. A rough sketch of what such an annotation could look like is below.
3. Use the new annotation in fb_reshape_row.
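As a purely hypothetical illustration of step 2 (the annotation name and its field are invented here for discussion, not taken from the PR), such an annotation might look roughly like:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical method-level annotation whose fields would be parsed into a
// ComplexTypeFunctionDescriptor while building the function metadata.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface ScalarFunctionDescriptor
{
    // Argument positions whose row subfields can be pushed through the function unchanged.
    int[] pushdownSubfieldArgIndices() default {};
}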
Suggest adding a release note, or a
@kevintang2022 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
2c2555f to 2c73136 (Compare)
thanks for the new tests!
}
checkCondition(signature.getArgumentTypes().size() > pushdownSubfieldArgIndex, FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] has out of range pushdown subfield arg index", method);
String typeVariableName = signature.getArgumentTypes().get(pushdownSubfieldArgIndex).toString();
checkCondition(typeVariableName.equals(com.facebook.presto.common.type.StandardTypes.ROW) || typeConstraintMapping.get(typeVariableName).equals(com.facebook.presto.common.type.StandardTypes.ROW), FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] does not have a struct or row type as pushdown subfield arg", method);
Or it has no typeConstraintMapping entry (maybe there's no constraint on the type), or the typeConstraintMapping value is null and TypeVariableConstraint.nonDecimalNumericRequired is false (i.e., there's no constraint preventing a row type)?
I don't think this is too restrictive. The argument index is only checked for these conditions if pushdownSubfieldArgIndex.isPresent().
Also, the typeVariableName should always be present, because signature.getArgumentTypes() always contains the same number of parameters as the function signature; there is always a type constraint imposed on a positional argument.
I think you might have it confused with typeVariableConstraints, which might be empty if we don't use an alias for a type (T for RowType). typeConstraintMapping is built using typeVariableConstraints.
So my check ensures that the parameter is either directly named a "row" type, or it has a type alias that maps to a "row" type.
In summary: typeVariableConstraints is retrieved from the annotation @TypeParameter(value = "T", boundedBy = ROW), and argumentTypes is retrieved from the function definition public static Block customStructWithPassthrough(@SqlType("T") Block struct).
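Putting those two pieces together, here is a minimal sketch of the kind of passthrough test function being described, using the PR's original annotation-based approach. Only the @TypeParameter line and the method signature come from this thread; the class name, the @ScalarFunction/@SqlType placement, and the imports are assumptions for illustration.

import com.facebook.presto.common.block.Block;
import com.facebook.presto.spi.function.ScalarFunction;
import com.facebook.presto.spi.function.SqlType;
import com.facebook.presto.spi.function.TypeParameter;

public final class CustomStructWithPassthrough
{
    private CustomStructWithPassthrough() {}

    // Passthrough: takes a row ("T" bounded by ROW, as written earlier in this thread) and
    // returns it unchanged, so the only behavior it exercises is the new pushdown-subfield
    // metadata on argument index 0.
    @ScalarFunction(value = "custom_struct_with_passthrough", pushdownSubfieldArgIndex = 0)
    @TypeParameter(value = "T", boundedBy = ROW)
    @SqlType("T")
    public static Block customStructWithPassthrough(@SqlType("T") Block struct)
    {
        return struct;
    }
}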

Is it possible for something of a generic type "T" to accept a row type but not have variadicBound "row"? E.g., maybe it can be used for all types, so it doesn't have a TypeVariableConstraint, or it just requires that the types be orderable but can be a row or another type (so the variadic bound is null)?
Is this the logic desired?
// The type variable must be directly a ROW type,
// or it is a type alias that is not bounded by any type,
// or it is a type alias whose bound is a row type.
boolean meetsTypeConstraint = (!typeConstraintMapping.containsKey(typeVariableName) && typeVariableName.equals(com.facebook.presto.common.type.StandardTypes.ROW)) ||
        (typeConstraintMapping.containsKey(typeVariableName) && typeConstraintMapping.get(typeVariableName).getVariadicBound() == null && !typeConstraintMapping.get(typeVariableName).isNonDecimalNumericRequired()) ||
        (typeConstraintMapping.containsKey(typeVariableName) && com.facebook.presto.common.type.StandardTypes.ROW.equals(typeConstraintMapping.get(typeVariableName).getVariadicBound()));
yeah, that looks good
checkCondition(pushdownSubfieldArgIndex.get() >= 0, FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] has negative pushdown subfield arg index", method);
checkCondition(signature.getArgumentTypes().size() > pushdownSubfieldArgIndex.get(), FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] has out of range pushdown subfield arg index", method);
String typeVariableName = signature.getArgumentTypes().get(pushdownSubfieldArgIndex.get()).toString();
checkCondition(typeVariableName.equals(com.facebook.presto.common.type.StandardTypes.ROW) || typeConstraintMapping.get(typeVariableName).equals(com.facebook.presto.common.type.StandardTypes.ROW), FUNCTION_IMPLEMENTATION_ERROR, "Method [%s] does not have a struct or row type as pushdown subfield arg", method);
Same comment as above: this may be too restrictive and might need some additional conditions.
@kevintang2022 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
4dcffbd to 3785079 (Compare)
@kevintang2022 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:
For some user-defined functions, the pushdown subfield optimizer should transparently push down the utilized subfields of a struct type. The goal is to make the query plan look the same as if the UDF were not being called on the struct. To accomplish this, the user-defined function needs to take the struct argument passed into it and unwrap it when converting an expression to a subfield.
Since there is no guarantee that the struct argument is always the first argument of the UDF, the UDF needs to specify which argument index to push down in its metadata.
T224244100
Presto version 0.293-20250525.210422-369
Differential Revision: D74738214
Test plan:
With this change, both of the queries below produce the same query plan after the table scan node rewrite
20250525_235045_00003_tu7a9 correct query plan with pushed down subfield
20250525_235408_00004_tu7a9 query plan with non relevant function
20250525_235845_00005_tu7a9 expected plan
Verifier suite build: 20250521_205359_71488_cm4iz
UDF only
https://www.internalfb.com/intern/presto/verifier/results/?test_id=223902
General
https://our.intern.facebook.com/intern/presto/verifier/results/?test_id=223903
Release Notes
Please follow release notes guidelines and fill in the release notes below.